Conference Proceedings
Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation
Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang
58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020) | ASSOC COMPUTATIONAL LINGUISTICS-ACL | Published : 2020
Abstract
In task-oriented dialogue systems, dialogue policy optimization often receives feedback only upon task completion. This is insufficient for training intermediate dialogue turns, since supervision signals (or rewards) are provided only at the end of dialogues. To address this issue, reward learning has been introduced to learn from state-action pairs of an optimal policy and provide turn-by-turn rewards. This approach requires complete state-action annotations of human-to-human dialogues (i.e., expert demonstrations), which is labor-intensive. To overcome this limitation, we propose a novel reward learning approach for semi-supervised policy learning. The proposed approach learns a dynamics model as ..
Grants
Awarded by Australian Research Council (ARC)
Funding Acknowledgements
We would like to thank Xiaojie Wang for his help. This work is supported by the Australian Research Council (ARC) Discovery Project DP180102050 and the China Scholarship Council (CSC).